Abstract: Recently, the precise performance of the Generalized
LASSO algorithm for recovering structured signals from
compressed noisy measurements, obtained via i.i.d. Gaussian
matrices, has been characterized. The analysis is based on a
framework introduced by Stojnic and heavily relies on the use
of Gordon's Gaussian min-max theorem (GMT), a comparison
principle on Gaussian processes. As a result, corresponding
characterizations for other ensembles of measurement matrices
have not been developed. In this work, we analyze the
corresponding performance of the ensemble of isotropically
random orthogonal (i.r.o.) measurements. We consider the
constrained version of the Generalized LASSO and derive a
sharp characterization of its normalized squared error in the
large-system limit. When compared to its Gaussian counterpart,
our result analytically confirms the superiority in performance
of the i.r.o. ensemble. Our second result derives an asymptotic
lower bound on the minimum conic singular values of i.r.o.
matrices. This bound is larger than the corresponding bound
on Gaussian matrices. To prove our results, we express i.r.o.
matrices in terms of Gaussians and show that, with some
modifications, the GMT framework is still applicable.

I. Introduction 

A. Setup 

Consider the classical problem of reconstructing a structured
signal $x_0 \in \mathbb{R}^n$ from linear, compressed and noisy
measurements $y = Ax_0 + z \in \mathbb{R}^m$. Here, A is the
measurement matrix with compression rate $m/n < 1$ and z is
the noise vector. A standard method for recovering $x_0$ is to
solve a convex optimization program that enforces our prior
knowledge about the distribution of the noise vector and the
structure of the unknown signal. We model z as a zero-mean
Gaussian vector with covariance matrix $\sigma^2 I_m$. Also,
assume $f: \mathbb{R}^n \to \mathbb{R}$ to be a convex function
that induces the structure of $x_0$, e.g., the $\ell_1$-norm for
sparsity, the nuclear norm for low-rankness, etc. A popular
algorithm in this direction is the Generalized $\ell_2^2$-LASSO,
which solves

$$\hat{x} := \arg\min_{x}\ \frac{1}{2}\|y - Ax\|_2^2 + \lambda f(x), \tag{1}$$

for a regularization parameter $\lambda > 0$. We measure the
performance of (1) with the Normalized Squared Error (NSE):

$$\mathrm{NSE}(\sigma) := \|\hat{x} - x_0\|_2^2/\sigma^2, \tag{2}$$

and are interested in characterizing its behavior as a function
of n, m, f, $x_0$, σ and λ. To get a handle on this question,
it is common to model the sampling matrix A as chosen at 
random from some ensemble. In particular, two prominent 
models for the measurement matrix are: 


(a) Gaussian: The entries of A are i.i.d. standard normal. This
assumption is primarily motivated by (i) the well-understood
and remarkable properties of the Gaussian ensemble, and (ii) the
so-called universality property, i.e., many results turn out to
hold true for matrices with i.i.d. entries drawn from a wide
class of probability distributions.

(b) Isotropically Random Orthogonal (i.r.o.): The matrix A
is sampled uniformly at random from the manifold of
row-orthogonal matrices satisfying $AA^T = I_m$. Such orthogonal
matrices are occasionally referred to as being "Haar
distributed". Matrices with orthogonal rows are often preferred
in practice because their condition number is one and they do
not amplify the noise. As a result, they have superior noise
performance, something we shall also observe in this paper.
Furthermore, certain classes of orthogonal matrices,
such as Fourier, discrete-cosine and Hadamard, allow for fast
multiplication and reduced complexity.

B. Background 

Understanding the reconstruction performance of (1) has
attracted enormous research attention over the past two
decades or so. However, only recently has a precise analysis
of the noisy case been developed.

1) Noiseless Case: In the noiseless case, it has been shown
[1] that the unique solution to $\min_{\{x \mid y = Ax\}} f(x)$ is the true
vector $x_0$ if the number of measurements m satisfies

$$m > \omega_{f,x_0}^2. \tag{3}$$

Here, $\omega_{f,x_0}$ is a geometric measure of the complexity of f
and $x_0$, defined in Section II. (3) is precise in the sense
that the same number of measurements is also necessary
[2]. This result is universal over the Gaussian and the i.r.o.
ensembles: A appears in the optimality conditions only through
its nullspace, which in both cases is an isotropically random
subspace of $\mathbb{R}^n$ of dimension $n - m$.

2) Noisy Case: Most results in the noisy case are order-wise,
in the sense that they hold only up to unknown numerical
constants.

Gaussian Ensemble: Precise bounds on the NSE of the Generalized
LASSO with Gaussian measurements have appeared
only very recently. To the best of our knowledge, the first
such results appear in [3], [4] when $\ell_1$ regularization is
used in (1). More recently, Stojnic introduced in [5] a novel
framework, which is based on the use of Gordon's Gaussian
min-max Theorem (GMT) [6, Lem. 3.1]. The framework has
proved to be powerful (also, see [7]) and has resulted in 
simple, yet precise bounds on the NSE of the Generalized 
LASSO [5], [8]-[10]. Those results resemble (3) for the 
noiseless case. To get a flavor, consider the constrained 
version¹ of (1) (C-LASSO), which solves

$$\hat{x} := \arg\min_{x}\ \|y - Ax\|_2 \quad \text{subject to} \quad f(x) \le f(x_0). \tag{4}$$

It was shown in [5], [8] that the NSE of (4) under Gaussian
measurements is upper bounded by

$$\frac{\omega_{f,x_0}^2}{m - \omega_{f,x_0}^2}. \tag{5}$$

The bound is precise, since it is shown to be achieved with
equality in the limit $\sigma \to 0$.

I.R.O. Ensemble: Unlike the noiseless case, in the noisy
setting i.r.o. matrices exhibit different recovery performance
than Gaussians. Using the replica method from
statistical physics and through extensive simulations,
[11], [12] derive expressions that characterize the NSE of (1)
and report that orthogonal constructions provide superior
performance compared to their Gaussian counterparts. As
mentioned in [11], even though the replica method provides a
powerful tool for tackling hard analytical problems, it still
lacks mathematical rigor in some parts. As a follow-up to
these reports, and also driven by the fact that orthogonal
constructions are easier to implement in practical applications
[12], it is of interest to prove precise bounds on the achieved
NSE, ones that would resemble those of [5], [8], [10] for
Gaussian constructions. In this direction, Oymak and Hassibi
showed in [13] that the noisy performance of i.r.o. matrices is
at least as good as that of Gaussians. To conclude this, they
proved that the minimum conic singular value (mCSV) of the
former can be no smaller than that of the latter. The mCSV
appears naturally as a measure of robustness to noise (e.g.,
[1, Cor. 3.3]); thus, the achieved NSE of i.r.o. matrices can be
no worse than that of Gaussians. Adding to this, [13]
conjectures a formula for the NSE of (4) when A is i.r.o.


C. Contribution


We prove in Theorem 2.1 that when the measurement
matrix A is i.r.o., the NSE of (4) in the high-SNR
regime ($\sigma \to 0$) behaves precisely as²

$$\frac{\omega_{f,x_0}^2}{m - \omega_{f,x_0}^2}\cdot\frac{n - \omega_{f,x_0}^2}{n}. \tag{6}$$

As is the case for the Gaussian ensemble (cf. (5)), we
conjecture this to be the worst-case value of the NSE over
all σ. Since $n - \omega_{f,x_0}^2 < n$, when compared to (5), our
result implies the superiority in performance of the i.r.o.
ensemble over the Gaussian one. In particular, this rigorously
establishes the conjecture raised in [13]. Our second result,
Theorem 2.2, derives a high-probability lower bound on the
mCSV of i.r.o. matrices. The bound is seen to exceed the
corresponding well-known bound for Gaussian matrices.


¹From Lagrange duality, there exists a value of λ in (1) such that the two
versions are equivalent.

²(6) holds for an i.r.o. matrix A scaled such that $AA^T = nI_m$. This allows
for a fair comparison with i.i.d. standard Gaussian matrices, for which
$\mathbb{E}[AA^T] = nI_m$.
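To make the comparison between (5) and (6) concrete, the following minimal Python sketch evaluates both high-SNR formulas for a given value of $\omega_{f,x_0}^2$. The function names and the numbers in the usage example are illustrative assumptions, not values taken from the paper. Note that the ratio of (6) to (5) is $(n - \omega_{f,x_0}^2)/n$, independent of m.

```python
import math


def nse_gaussian(m, w2):
    """High-SNR NSE (5) of the C-LASSO (4) for i.i.d. standard Gaussian A.

    m  : number of measurements
    w2 : squared Gaussian width omega_{f,x0}^2 of the tangent cone
    """
    if m <= w2:
        return math.inf  # at or below the noiseless phase transition (3)
    return w2 / (m - w2)


def nse_iro(m, n, w2):
    """High-SNR NSE (6) for an i.r.o. A scaled so that A A^T = n I_m."""
    return nse_gaussian(m, w2) * (n - w2) / n


if __name__ == "__main__":
    n, w2 = 256, 44.0  # illustrative values only
    for m in (50, 80, 128, 200, 256):
        g, o = nse_gaussian(m, w2), nse_iro(m, n, w2)
        print(f"m={m:3d}  Gaussian (5): {g:.3f}  i.r.o. (6): {o:.3f}  ratio: {o / g:.3f}")
```

The gap between the two ensembles is therefore largest when $\omega_{f,x_0}^2$ is a sizeable fraction of n, and it vanishes when $\omega_{f,x_0}^2 \ll n$.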


D. Approach 

The set of techniques available for dealing with i.r.o. 
matrices is limited compared to the variety of methods 
available for working with Gaussian matrices. Nonetheless, 
we are able to prove (6) based on a modification of the 
same framework [7] that led to corresponding results for 
the Gaussian case [5], [8]-[10], [14]. As mentioned, the 
framework builds upon the GMT, a comparison lemma on 
Gaussian processes. In particular, [5], [8] use the fact that 
$\|a\|_2 = \max_{\|u\|_2\le 1} u^Ta$ to write (4) as

$$\min_{x}\ \max_{\|u\|_2\le 1}\ u^T(y - Ax) \quad \text{subject to} \quad f(x) \le f(x_0), \tag{7}$$

to which GMT is directly applicable. In contrast, when A
is i.r.o., it is not at all obvious how to use GMT. To start
with, there is no Gaussian matrix. The key idea here is to
equivalently express an i.r.o. matrix as

$$A = (GG^T)^{-1/2}G,$$

where $G \in \mathbb{R}^{m\times n}$ has i.i.d. standard Gaussian entries and
$(GG^T)^{-1/2}$ is the inverse of the square root of the
positive definite (with probability one) $m\times m$ matrix $GG^T$.
Substituting in (4), the LASSO objective is closer to, but not
yet quite of, the form required by GMT. In particular, the
slick trick that led to (7) is not enough here, and additional
ideas are required. Using these, we are able to bring (4) into
the desired format; the argument is sketched in Section III-C.
Once this is done, what remains is to apply the framework
of [7] to reach the desired conclusion.

II. Result

A. Setup 

Let Xq G i?", y = Axq + av € R™ and convex f : 
R" R. The constrained Generalized LASSO (C-LASSO) 
solves (4). The reconstruction vector x depends explicitly on 
A,/, Xq, and, implicitly on cr, v through the measurement 
vector y. Define the NSE of (4) as in (2). 

1) Assumptions: The matrix $A \in \mathbb{R}^{m\times n}$, $m < n$, is
modeled to have orthogonal rows, $AA^T = I_m$, and the joint
probability density of its elements remains unchanged when
A is pre- and post-multiplied by any orthogonal matrices
$\Phi \in \mathbb{R}^{m\times m}$, $\Theta \in \mathbb{R}^{n\times n}$, i.e., $p(\Phi A\Theta) = p(A)$. We
say that A is i.r.o.³. The noise vector v has i.i.d.
standard normal entries $\mathcal{N}(0, 1)$, $f: \mathbb{R}^n \to \mathbb{R}$ is assumed convex
and continuous, and $x_0$ is not a minimizer of f. Popular
regularizers include the $\ell_1$-norm, the nuclear norm, the $\ell_{1,2}$-norm,
etc. (please refer to [1], [2] for further examples).

³Different terminologies that appear in the literature to describe the same
distribution include "random m-frames in $\mathbb{R}^n$" and "distributed according
to the Haar measure on the Stiefel manifold"; see [15].





2) Large system limit: Our results hold in an asymptotic
regime in which the problem dimensions grow to infinity. We
consider a sequence of problem instances $\{A, v, x_0, f\}_{m,n}$
as in (4), indexed by m and n such that both $m, n \to \infty$. In
each problem instance, A, v and f satisfy the assumptions
of Section II-A.1. Furthermore, $\hat{x}$ and NSE(σ) denote the
output of (4) and the corresponding NSE. To keep notation
simple, we do not explicitly indicate the dependence of
variables on the problem dimensions m, n.

3) NSE: Define the worst-case and asymptotic NSE as
$\mathrm{wNSE} := \sup_{\sigma>0} \mathrm{NSE}(\sigma)$ and $\mathrm{aNSE} := \lim_{\sigma\to0} \mathrm{NSE}(\sigma)$,
respectively. The importance of studying the aNSE stems
from the fact that wNSE = aNSE in several cases (including
C-LASSO with Gaussian measurements; see also [8],
[10], [16]). Theorem 2.1 precisely characterizes the aNSE. We
conjecture that the same expression predicts the wNSE.

4) Terminology: 

Definition 2.1 (Tangent cone): Consider $f: \mathbb{R}^n \to \mathbb{R}$,
$x_0 \in \mathbb{R}^n$ and its set of descent directions $\mathcal{D}_f(x_0) := \{w \in
\mathbb{R}^n \mid f(x_0 + w) \le f(x_0)\}$. The tangent cone of f at $x_0$ is
defined as $\mathcal{T}_f(x_0) := \mathrm{Cl}(\mathrm{cone}(\mathcal{D}_f(x_0)))$, where cone(·)
and Cl(·) return the conic hull and the closure of a set,
respectively.

Definition 2.2 (Gaussian width): Let $h \in \mathbb{R}^n$ have i.i.d.
standard normal entries. The Gaussian width of the tangent
cone of f at $x_0 \in \mathbb{R}^n$ is defined as

$$\omega_{f,x_0} := \mathbb{E}_h\Big[\sup_{w\in\mathcal{T}_f(x_0),\ \|w\|_2 = 1} h^Tw\Big].$$

The Gaussian width is a geometric measure of the size of 
the tangent cone. It is similarly defined for any set; the 
definition above is specific to our application. Please refer 
to [1], [2] for detailed discussions on its role in asymptotic 
convex geometry and on its properties. We also need the 
definition of the minimum conic singular value (mCSV) of a
matrix A. This can be defined for any cone in $\mathbb{R}^n$. To avoid
introducing extra notation, we only define it with respect to 
the tangent cone of a function. 

Definition 2.3 (Minimum conic singular value): Let $A \in
\mathbb{R}^{m\times n}$. The minimum conic singular value of A with respect
to the tangent cone of f at $x_0 \in \mathbb{R}^n$ is defined as

$$\sigma_{\min}(A; \mathcal{T}_f(x_0)) := \inf_{x\in\mathcal{T}_f(x_0),\ \|x\|_2 = 1} \|Ax\|_2.$$

Note that $\sigma_{\min}(A; \mathbb{R}^n)$ is the minimum singular value of A.


B. Results 


Our results hold in the asymptotic linear regime, where
m, n and $\omega_{f,x_0}^2$ all grow to infinity such that $m/n \to \delta \in
(0,1)$ and $(1 - \epsilon)m > \omega_{f,x_0}^2 > \epsilon m$ for some constant $\epsilon > 0$.

In particular, assume the setup of Section II-A.2 under
this linear regime. Also, let $A \in \mathbb{R}^{m\times n}$ be distributed i.r.o.
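Theorem 2.1 (C-LASSO): Consider (4) and let

$$\mathrm{aNSE} := \lim_{\sigma\to0} \|\hat{x} - x_0\|_2^2/\sigma^2.$$

The following limit holds in probability:

$$\lim_{n\to\infty} \frac{(m - \omega_{f,x_0}^2)\,\mathrm{aNSE}}{\omega_{f,x_0}^2\,(n - \omega_{f,x_0}^2)} = 1.$$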

Theorem 2.2 (Minimum Conic Singular Value): Denote 

_ ~ ^/m _ 

•\/n — m n n 

and p := w/ xo/x + n —rn. Eor all C > 0, with probability 1 
in the limit n —>■ 00 , (Tniin(A; 7/(xo)) is lower bounded by 

/ m + - 2px^/,xo - PX^jn - m) _ ^ 

y m-\- p 

C. Remarks 

1) C-LASSO: 

Comparison to Gaussian case: For an i.i.d. Gaussian matrix
with entries of variance 1/n, it has been shown in [8] that
$\mathrm{aNSE}/n \approx \omega_{f,x_0}^2/(m - \omega_{f,x_0}^2)$. This is strictly greater
than the corresponding expression of Theorem 2.1, proving that
the i.r.o. ensemble has strictly superior noise performance. Note
that when $\omega_{f,x_0}^2 < m \ll n$, the two formulae are close to
each other. This agrees with the fact that the entries of
a very "short" i.r.o. matrix are effectively independent for
many practical purposes [17]. Finally, observe that both
bounds approach infinity as the number of measurements
m approaches $\omega_{f,x_0}^2$. Of course, this agrees with the phase
transition in the noiseless case (cf. (3)), which is the same for
both ensembles.

Interpretation: As seen, the formula of Theorem 2.1 closely
resembles the corresponding results for the Gaussian case.
Thus, most of the remarks made for the Gaussian case (e.g.,
[10]) regarding the role of the involved parameters, the
geometric nature of the bound and its generality directly transfer
to our case. It is useful to remark that $\omega_{f,x_0}^2$ admits precise
high-dimensional approximations, either in closed form or
numerically tractable, for a number of useful instances of f
and $x_0$, e.g., [1], [2], [8]. For a mere illustration, for $f = \|\cdot\|_1$
and $x_0$ a k-sparse signal, $\omega_{f,x_0}^2 \le 2k(\log(n/k) + 1)$.
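As an example of such a numerically tractable approximation for $f = \|\cdot\|_1$, one can compute $\min_{\tau\ge0}\mathbb{E}\,\mathrm{dist}^2(h, \tau\,\partial\|x_0\|_1)$ with $h \sim \mathcal{N}(0, I_n)$. This quantity upper bounds $\omega_{f,x_0}^2$ and is known to be accurate in high dimensions; the recipe is a standard one from the convex-geometry literature and is our illustrative choice here, not a step of the proof. The sketch below uses only the Python standard library, evaluates the expectation in closed form, minimizes it over τ by ternary search (the function is convex in τ), and compares the result with the bound above. The function names are hypothetical.

```python
import math


def _phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _tail(t):
    """Q(t) = P(g > t) for g ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def l1_expected_dist_sq(tau, n, k):
    """E dist^2(h, tau * subdiff ||x0||_1) for h ~ N(0, I_n) and k-sparse x0.

    On the support each coordinate contributes E(h_i - tau*sign)^2 = 1 + tau^2;
    off the support it contributes E(|h_i| - tau)_+^2 = 2[(1 + tau^2) Q(tau) - tau phi(tau)].
    """
    on = k * (1.0 + tau * tau)
    off = (n - k) * 2.0 * ((1.0 + tau * tau) * _tail(tau) - tau * _phi(tau))
    return on + off


def l1_width_sq_estimate(n, k, hi=10.0, iters=200):
    """Minimize the convex function of tau above by ternary search on [0, hi]."""
    lo = 0.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if l1_expected_dist_sq(a, n, k) < l1_expected_dist_sq(b, n, k):
            hi = b
        else:
            lo = a
    return l1_expected_dist_sq(0.5 * (lo + hi), n, k)


if __name__ == "__main__":
    n, k = 256, 10  # the setting of Fig. 1
    est = l1_width_sq_estimate(n, k)
    bound = 2 * k * (math.log(n / k) + 1)
    print(f"estimate of omega^2: {est:.1f}, bound 2k(log(n/k)+1): {bound:.1f}")
```

For n = 256 and k = 10 the estimate is noticeably smaller than the closed-form bound, so plugging the estimate rather than the bound into Theorem 2.1 gives a sharper prediction.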

wNSE: We conjecture that wNSE = aNSE. In this case,
Theorem 2.1 would provide a tight upper bound on NSE(σ)
for any σ. Simulation results in Figure 1 support the claim.

Universality: [13] shows numerical evidence that partial
Discrete Cosine Transform (DCT) matrices, obtained by
randomly sampling m rows of the DCT matrix without
replacement, and similarly sampled Hadamard (HDM) matrices
exhibit the same NSE performance as the i.r.o. ensemble.
Our simulations in Figure 1 confirm this; thus, Theorem
2.1 appears to predict the NSE of random DCT and HDM
matrices as well. Understanding the behavior of such
ensembles is of great practical importance due to their
favorable attributes [12].

2) Minimum conic singular value: 

Comparison to Gaussian case: A standard application of
GMT shows that the mCSV of a matrix with i.i.d. $\mathcal{N}(0, 1/n)$
entries is lower bounded by $(\sqrt{m} - \omega_{f,x_0})/\sqrt{n}$, e.g., [1,
Cor. 3.3]. The bound of Theorem 2.2 on the mCSV of an i.r.o.
matrix exceeds that, which is a strong indication that i.r.o.
matrices are strictly better conditioned than corresponding
Gaussian ones. See Figure 2 for an illustration.


[Figure 1 panel: n = 256, k = 10]

Fig. 1: Illustration of Theorem 2.1 for $f = \|\cdot\|_1$ and $x_0 \in \mathbb{R}^n$ a 10-sparse
vector. Simulation results support the claim that aNSE = wNSE.
Furthermore, randomly sampled Discrete Cosine Transform (DCT) and
Hadamard (HDM) matrices appear to have the same NSE performance as i.r.o.
matrices. Measured values of the NSE are averages over 25 realizations.


[Figure 2 panel: k/n = 10/256]

Fig. 2: Illustration of Theorem 2.2. The bound exceeds the corresponding
bound for Gaussian matrices. We have chosen $f = \|\cdot\|_1$ and $x_0 \in \mathbb{R}^n$ a
k-sparse vector.


Sanity test: When $\omega_{f,x_0}^2 < m \ll n$, the entries of an
i.r.o. matrix behave almost as if they were independent [17]. As
expected, then, in this regime the bound of Theorem 2.2
approaches $(\sqrt{m} - \omega_{f,x_0})/\sqrt{n}$, which coincides with the
bound for Gaussians. On the other hand, when m = n, it
can be seen that, as expected, the expression of Theorem 2.2
approaches one.

Tightness: Theorem 2.2 provides no guarantees on the 
exactness of the derived lower bound. This is also the case for 
the corresponding result on the mCSV of Gaussian matrices. 
Proving (or disproving) the exactness of the bounds is an 
open research problem. 

General cones: The bound of Theorem 2.2 holds
for the minimum singular value of A with respect to any
cone, not necessarily a tangent cone or even a convex cone.
One just needs to replace $\omega_{f,x_0}$ with the Gaussian width of
the corresponding cone. Also, a non-asymptotic version of
Theorem 2.2 is possible, and will be included in the extended
version of the paper.

III. Proof Outline

In this section, we outline the main steps of the proof. We
focus on Theorem 2.1. The proof of Theorem 2.2 follows
along the same ideas and is only briefly discussed in the
Appendix. Due to space considerations, we limit our attention
to the steps and modifications required to apply GMT in the
case of i.r.o. matrices. This part of the proof involves several
new ideas; once the problem has been transformed into one
where the GMT framework is applicable, the rest follows along
the lines of [5], [8]-[10]. This latter part and some technical
details not discussed here are deferred to the Appendix and the
extended version of the paper. We rewrite (4) by changing the
decision variable to the error vector $w := x - x_0$:

$$\hat{w} := \arg\min_{w\in\mathcal{D}_f(x_0)} \|Aw - \sigma v\|_2. \tag{8}$$

We evaluate the limiting behavior $\lim_{\sigma\to0} \|\hat{w}\|^2/\sigma^2$.
Throughout, we write $\|\cdot\|$ instead of $\|\cdot\|_2$.

A. Formulation in terms of Gaussians 

We begin with a lemma that provides a simple characterization
of i.r. orthogonal matrices in terms of Gaussians.
Let $X^{1/2}$ denote a square root of a matrix $X \in \mathbb{R}^{m\times m}$ and
$X^{-1/2}$ its inverse (if it exists). Also, for random variables x
and y with the same distribution, we write $x \sim y$.

Lemma 3.1 (I.r. orthogonal matrices): Let $G \in \mathbb{R}^{m\times n}$
have i.i.d. $\mathcal{N}(0,1)$ entries. Then the matrix $A =
(GG^T)^{-1/2}G$ is an $m \times n$ i.r. orthogonal matrix.

Proof: It can be readily confirmed that $AA^T = I_m$.
We need to prove that the distribution of A remains
invariant after pre- and post-multiplication with
orthogonal matrices of appropriate sizes. Let $\Phi \in \mathbb{R}^{m\times m}$,
$\Theta \in \mathbb{R}^{n\times n}$ be any orthogonal matrices. First,

$$A\Theta = (GG^T)^{-1/2}G\Theta = \big((G\Theta)(G\Theta)^T\big)^{-1/2}G\Theta.$$

Recall that the Gaussian distribution is invariant under
orthogonal transformations, i.e., $G \sim G\Theta$, to conclude
from the above that $A\Theta \sim A$. Next, $G \sim \Phi G$. Also,
it can be directly verified that $\Phi(GG^T)^{-1/2}\Phi^T$ is the
inverse of a square root of $\Phi GG^T\Phi^T$. With these,

$$A \sim \big((\Phi G)(\Phi G)^T\big)^{-1/2}\Phi G = \Phi(GG^T)^{-1/2}\Phi^T\Phi G = \Phi A.$$
■ 
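Lemma 3.1 also suggests a direct way to draw i.r.o. matrices numerically. The sketch below is a pure-Python illustration under our own choices: the helper names are hypothetical, and the inverse square root of $GG^T$ is computed with a coupled Newton-Schulz iteration (a standard matrix-function iteration, not something the paper prescribes). In practice one would rather call an eigendecomposition routine from a linear-algebra library.

```python
import math
import random


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def inv_sqrt_spd(M, iters=50):
    """M^{-1/2} for a symmetric positive definite M via the coupled
    Newton-Schulz iteration Y <- Y T, Z <- T Z with T = (3I - Z Y) / 2."""
    k = len(M)
    c = math.sqrt(sum(x * x for row in M for x in row))  # Frobenius norm >= largest eigenvalue
    Y = [[x / c for x in row] for row in M]  # eigenvalues now in (0, 1], so the iteration converges
    Z = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(iters):
        ZY = matmul(Z, Y)
        T = [[(3.0 * (i == j) - ZY[i][j]) / 2.0 for j in range(k)] for i in range(k)]
        Y, Z = matmul(Y, T), matmul(T, Z)
    # Z converges to (M / c)^{-1/2}, hence M^{-1/2} = Z / sqrt(c)
    s = math.sqrt(c)
    return [[z / s for z in row] for row in Z]


def sample_iro(m, n, rng=random):
    """Draw A = (G G^T)^{-1/2} G with G an m x n matrix of i.i.d. N(0, 1) entries."""
    G = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    return matmul(inv_sqrt_spd(matmul(G, transpose(G))), G)


if __name__ == "__main__":
    A = sample_iro(4, 20, random.Random(1))
    AAt = matmul(A, transpose(A))
    err = max(abs(AAt[i][j] - float(i == j)) for i in range(4) for j in range(4))
    print(f"max |A A^T - I| = {err:.2e}")
```

The printed deviation of $AA^T$ from $I_m$ should be at the level of floating-point rounding, matching the first step of the proof above; the invariance part of the lemma is a distributional statement and is not checked here.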

Next, we use Lemma 3.1 to write the objective function in 
(8) in terms of Gaussian matrices. 

Lemma 3.2 (LASSO Objective): Assume $A \in \mathbb{R}^{m\times n}$
is i.r. orthogonal and $v \in \mathbb{R}^m$ is standard Gaussian,
independent of each other. Then, for any $w \in \mathbb{R}^n$,

$$(Aw - \sigma v) \sim (GG^T)^{-1/2}G(w - \sigma q),$$

where $G \in \mathbb{R}^{m\times n}$ and $q \in \mathbb{R}^n$ have i.i.d. $\mathcal{N}(0, 1)$ entries and
are independent of each other.

Proof: Let A, G, v, q be as in the statement of the lemma.
For any row-orthogonal $Q \in \mathbb{R}^{m\times n}$, $v \sim Qq$. Furthermore,
since q is independent of Q, the same holds when Q is random,
i.e., $(Q, Qq) \sim (Q, v)$ for v independent of Q. Hence, letting
$Q = A$, we have $(Aw - \sigma v) \sim A(w - \sigma q)$. Apply Lemma 3.1 to
conclude. ■

B. Convex Gaussian min-max Theorem 

We get a handle on (8) and its optimal value by analyzing
a different and simpler optimization problem, which we call
the Auxiliary Optimization (AO) problem, as in [7]. The
machinery that allows this relies on Gordon's Gaussian min-max
theorem (GMT) [6, Lem. 3.1]. In fact, we require a stronger
version of the GMT that can be obtained when it is accompanied
with additional convexity assumptions that are not present in
its original formulation. The fundamental idea is attributed
to Stojnic [5]. [7] builds upon this and derives a concrete and
somewhat extended statement of the result in [7, Thm. II.1].
Please refer to [7] for a discussion of the GMT, the role of
convexity, and the differences between [6, Lem. 3.1], [5] and
[7, Thm. II.1]. We summarize the main idea of [7, Thm. II.1]
in the next few lines. Let $G \in \mathbb{R}^{m\times n}$, $g \in \mathbb{R}^m$, $h \in \mathbb{R}^n$
have i.i.d. Gaussian entries; let $S_a \subset \mathbb{R}^n$, $S_b \subset \mathbb{R}^m$ be convex
compact sets, and let $\psi: S_a \times S_b \to \mathbb{R}$ be convex-concave and
continuous. Consider the two min-max problems in (9) and
(10), which we refer to as the Primary Optimization (PO) and
the Auxiliary Optimization (AO), respectively:

$$\Phi(G) := \min_{a\in S_a}\max_{b\in S_b}\ b^TGa + \psi(a, b), \tag{9}$$

$$\phi(g, h) := \min_{a\in S_a}\max_{b\in S_b}\ \|a\|g^Tb - \|b\|h^Ta + \psi(a, b). \tag{10}$$

Then, for any $\mu \in \mathbb{R}$ and $t > 0$:

$$\mathbb{P}\big(|\Phi(G) - \mu| > t\big) \le 2\,\mathbb{P}\big(|\phi(g, h) - \mu| > t\big).$$

In words, if the optimal cost of the (AO) concentrates around some
value μ, the same is true for the optimal cost of the (PO).
Assuming a setup in which the problem dimensions m, n
grow to infinity, it is shown in [7] that if φ(g, h) converges
in probability to a deterministic value $d^*$, then so does Φ(G).
What is more, if $\|a^*(g, h)\|$ converges to, say, $\alpha^*$, then, under
appropriate strong convexity assumptions on the objective of
(10), $\|a^*(G)\|$ converges to the same value. Here, $a^*(g, h)$ and
$a^*(G)$ denote the minimizers in (10) and (9), respectively.

C. Deriving the Auxiliary Optimization Problem 

Using Lemma 3.2, we work with the following (probabilistically)
equivalent formulation of (8):

$$\hat{w} := \arg\min_{w\in\mathcal{D}_f(x_0)} \|(GG^T)^{-1/2}G(w - \sigma q)\|_2. \tag{11}$$

This brings us a step closer to the framework of GMT, but not
yet quite to the point where we can identify the desired format
of the (PO) as described in (9). The goal of this section is
to complete this step. We start by using the fact that for any
$a \in \mathbb{R}^m$, $\|a\| = \max_{\|b\|\le1} b^Ta$. In particular, the objective
function in (11) can be expressed as follows:

$$\max_{\|b\|\le1} b^T(GG^T)^{-1/2}G(w - \sigma q) = \max_{\|(GG^T)^{1/2}b\|\le1} b^TG(w - \sigma q) = \max_{\|G^Tb\|\le1} b^TG(w - \sigma q).$$

It can be checked that the above is equivalent to

$$\max_{b}\ \min_{\ell}\ b^TG(w - \sigma q - \ell) + \|\ell\|,$$

since $\min_\ell\, -b^TG\ell + \|\ell\|$ equals 0 if $\|G^Tb\| \le 1$ and $-\infty$ otherwise.

Now, we flip the order of max-min [18, Cor. 37.3.2]⁴:

$$\hat{w} = \arg\min_{w\in\mathcal{D}_f(x_0),\ \ell}\ \max_{b}\ b^TG(w - \sigma q - \ell) + \|\ell\|,$$

or, redefining $\ell := w - \sigma q - \ell$:

$$\hat{w} = \arg\min_{w\in\mathcal{D}_f(x_0),\ \ell}\ \max_{b}\ b^TG\ell + \|w - \sigma q - \ell\|. \tag{12}$$

This brings (8) into the desired format of a (PO) problem⁵,
and allows us to derive the corresponding (AO) problem:

$$\tilde{w}(g, h, q) := \arg\min_{w\in\mathcal{D}_f(x_0),\ \ell}\ \max_{b}\ \|\ell\|g^Tb - \|b\|h^T\ell + \|w - \sigma q - \ell\|. \tag{13}$$

The rest of the proof analyzes (13) with the goal of
determining the limiting behavior of $\|\tilde{w}\|$, and is included in
the Appendix. We just remark here on the assumption of
the theorem that $\sigma \to 0$; this also provides a hint about the
presence of the Gaussian width of the tangent cone in the
final result. When $\sigma \to 0$, it suffices to analyze a "first-order
approximation" to problem (13) in which the feasible set
$\mathcal{D}_f(x_0)$ is substituted by (the closure of) its conic hull, i.e., $\mathcal{T}_f(x_0)$.
Since the tangent cone captures the local behavior in the
neighborhood of $x_0$, the relaxation is tight in the limit $\|w\|_2 \to 0$.
The idea is that in the limit $\sigma \to 0$, $\|\hat{w}\|$ is sufficiently small
and the approximation tight.
Appendix

Here we include a detailed proof of Theorem 2.1. In the 
last section, we provide a short overview of the proof of 
Theorem 2.2 which follows along the same key ideas. 

A. Preliminaries 

We rewrite (4) in a more convenient format for the
purposes of the analysis. In particular, we perform the
following operations in the order in which they appear: (i)
substitute $y = Ax_0 + \sigma v$, (ii) change the decision variable
to the quantity of interest, i.e., the normalized error vector
$w := (1/\sigma)(x - x_0)$, (iii) move the constraint on w to the
objective function by introducing a Lagrange multiplier λ,
and (iv) rescale the objective by a factor of $1/\sigma$. Then,

$$\hat{w} := \arg\min_{w}\ \max_{\lambda\ge0}\ \|Aw - v\|_2 + \frac{\lambda}{\sigma}\big(f(x_0 + \sigma w) - f(x_0)\big). \tag{14}$$

We will derive a precise expression for the limiting (as
$n \to \infty$) behavior of $\lim_{\sigma\to0} \|\hat{w}\|_2$. Note that after the
normalization of $x - x_0$ by σ, it is not guaranteed that the
optimal minimizer in (14) is bounded (think of $\sigma \to 0$).
However, we will prove that in the regime of Theorem 2.1
this is indeed the case. Many of the arguments that we use
in the analysis require boundedness of the constraint sets.
To tackle this, we assume that $\hat{w}$ is bounded by some large
constant $K > 0$ (with probability one over A, v), the value
of which is to be chosen at the end of the analysis. Recall that
at that point we will have a precise characterization of the
limiting behavior of $\|\hat{w}\|_2$, say $\alpha^*$. If $\alpha^*$ turns out to be
independent of the value of K with which we started, then
we may assume that this starting value was strictly larger
than $\alpha^*$. Thus, in what follows, we let K, B, M, ... denote
such (arbitrarily) large positive quantities. Also, throughout
the proof, we write $\|\cdot\|$ instead of $\|\cdot\|_2$.

B. The Primary Optimization (PO) 

Using Lemma 3.2, from now on we work with the following
(probabilistically) equivalent formulation of (14):

$$\hat{w} := \arg\min_{\|w\|\le K}\ \max_{\lambda\ge0}\ \|(GG^T)^{-1/2}G(w - q)\|_2 + \frac{\lambda}{\sigma}\big(f(x_0 + \sigma w) - f(x_0)\big). \tag{15}$$

The goal of this section is to bring this into a format for which
GMT is applicable. We start by using the fact that for any
$a \in \mathbb{R}^m$,

$$\|a\| = \max_{\|b\|\le1} b^Ta.$$

In particular, the first term in (15) can be expressed as
follows; to shorten notation, denote $c := w - q$:

$$\|(GG^T)^{-1/2}Gc\| = \max_{\|b\|\le1} b^T(GG^T)^{-1/2}Gc = \max_{\|(GG^T)^{1/2}b\|\le1} b^TGc = \max_{\|G^Tb\|\le1} b^TGc \tag{16}$$

$$= \max_{\|b\|\le B}\ b^TGc - \delta(G^Tb \mid \mathcal{B}^n). \tag{17}$$

In the last line above, $\delta(a \mid \mathcal{B}^n)$ denotes the indicator
function of the unit ball $\mathcal{B}^n$ of $\mathbb{R}^n$, i.e., it takes the value 0 if
$\|a\| \le 1$ and $+\infty$ otherwise. Also, we are allowed to assume that
b is bounded by some large $0 < B < \infty$, since the set of
optima in (16) is compact ($G^T$ has full column rank
with probability one). It can be readily checked (or see [18])
that $\delta(a \mid \mathcal{B}^n) = \sup_\ell a^T\ell - \|\ell\|$ for any $a \in \mathbb{R}^n$. Thus,
continuing from (17):

$$\|(GG^T)^{-1/2}Gc\| = \max_{\|b\|\le B}\ \inf_{\ell}\ b^TG(c - \ell) + \|\ell\|.$$

As a final step, we flip the order of max-min above.
This is allowed by [18, Cor. 37.3.2] since (i) the objective
function above is continuous, convex in ℓ, and concave in b,
(ii) the constraint sets are convex, and (iii) the set constraining
the maximization is bounded. Thus,

$$\|(GG^T)^{-1/2}Gc\| = \inf_{\ell}\ \max_{\|b\|\le B}\ b^TG(c - \ell) + \|\ell\|.$$

We argue that the infimum above is achieved over a bounded
set. Indeed, performing the maximization over b above gives

$$\inf_{\ell}\ \max_{\|b\|\le B}\ b^TG(c - \ell) + \|\ell\| = \inf_{\ell}\ B\|G(c - \ell)\| + \|\ell\|.$$

The sub-level sets of the (continuous) objective function in
the minimization on the right-hand side of the equation above
are clearly bounded. Hence, by Weierstrass' Theorem [20,
Prop. 2.1.1], the set of minima is nonempty and compact.
We may thus assume that there exists a large but finite N such
that constraining the minimization to $\|\ell\| \le N$ does not
increase the optimum. We may now substitute the above in
(15) to conclude with

$$\hat{w} = \arg\min_{\|w\|\le K,\ \|\ell\|\le N}\ \max_{\lambda\ge0,\ \|b\|\le B}\ b^TG(w - q - \ell) + \|\ell\| + \frac{\lambda}{\sigma}\big(f(x_0 + \sigma w) - f(x_0)\big),$$

or, redefining $\ell := w - q - \ell$ and appropriately adjusting
N:

$$\hat{w} = \arg\min_{\|w\|\le K,\ \|\ell\|\le N}\ \max_{\lambda\ge0,\ \|b\|\le B}\ b^TG\ell + \|w - q - \ell\| + \frac{\lambda}{\sigma}\big(f(x_0 + \sigma w) - f(x_0)\big). \tag{18}$$

This brings (14) into the desired format for the application of
GMT. In particular, identify $\psi([\ell, w], b) := \|w - q - \ell\| +
\max_{\lambda\ge0} \frac{\lambda}{\sigma}(f(x_0 + \sigma w) - f(x_0))$, which is continuous and
convex in $[\ell, w]$, as desired. This format is of course the
same as in (12), modulo the boundedness constraints, which
were not considered in the main body of the paper.

C. The Auxiliary Optimization (AO) for arbitrary σ

Let us write the (AO) problem as it corresponds to (18):

$$\tilde{w}(g, h, q) = \arg\min_{\|w\|\le K,\ \|\ell\|\le N}\ \max_{\lambda\ge0,\ \|b\|\le B}\ \|\ell\|g^Tb - \|b\|h^T\ell + \|w - q - \ell\| + \frac{\lambda}{\sigma}\big(f(x_0 + \sigma w) - f(x_0)\big). \tag{19}$$

Our goal in the rest of the section is to simplify (19). By
massaging the objective function and performing
minimizations/maximizations when possible, we eventually reach an
equivalent formulation in which most optimizations are over
scalar variables instead of vectors. Two remarks are
in order:

(a) We will need to flip the order of min-max several times;
unless stated otherwise, we apply [18, Cor. 37.3.2]. Here,
constraint sets will always be convex and the objective
function continuous, so we only need to worry about convexity
of the objective and boundedness of (at least one of) the
constraint sets.

(b) To keep notation short, we will often drop the set
constraints over the optimization variables when clear from
context. Recall that most of the constraints are just
boundedness constraints by constants that can be chosen large.

1) Maximizing over the direction of b: This is easy to
perform; note that $\max_{\|b\|=\beta} g^Tb = \beta\|g\|_2$ for $\beta \ge 0$.

2) Minimizing over ℓ: First, let us briefly argue that we
can "push" the minimization over ℓ to the right of the
maximization: (i) after optimizing over the direction of b,
the objective function in (19) is convex in ℓ, (ii) it is also
concave in λ, β, and (iii) ℓ is constrained to a bounded set.


To be able to optimize over ℓ, we use the following trick.
We express the terms $\beta\|g\|\|\ell\|$ and $\|w - q - \ell\|$ using the
fact that

$$\sqrt{x} = \min_{p>0}\ \Big\{\frac{x}{2p} + \frac{p}{2}\Big\}, \quad \forall x > 0. \tag{20}$$

Also, note that the set of minima above is clearly bounded
for bounded x. With these,


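$$\min_{\ell}\ \beta\big(\|\ell\|\|g\| - h^T\ell\big) + \|w - q - \ell\| = \min_{\substack{0<p\le P\\ 0<t\le T}}\ \Big\{\frac{p+t}{2} + \frac{1}{2p}\|w - q\|^2 + \frac{1}{p}\min_{\ell}\Big[\frac{t + \beta^2 p\|g\|^2}{2t}\|\ell\|^2 + (-\beta p h + q - w)^T\ell\Big]\Big\},$$

where (20) is applied with the variable t to $\beta\|g\|\|\ell\|$ and with the
variable p to $\|w - q - \ell\|$. The minimization over ℓ is an unconstrained,
strictly convex quadratic, and it contributes the term

$$-\frac{1}{2p}\,\frac{t}{t + \beta^2 p\|g\|^2}\,\|\beta p h - (q - w)\|^2.$$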
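Identity (20), which trades each norm for a quadratic at the cost of one extra scalar variable, is easy to sanity-check numerically. The standard-library sketch below (the helper name is hypothetical) minimizes the right-hand side of (20) over p by ternary search and recovers both $\sqrt{x}$ and the minimizer $p^* = \sqrt{x}$.

```python
import math


def sqrt_via_variational(x, lo=1e-12, hi=None, iters=200):
    """Evaluate min_{p>0} { x/(2p) + p/2 }, the right-hand side of (20).

    Returns (optimal value, minimizer). The objective is convex in p > 0,
    so ternary search on a bracketing interval converges to the minimizer.
    """
    if hi is None:
        hi = 2.0 * max(1.0, x)  # sqrt(x) <= max(1, x), so the minimizer lies inside

    def obj(p):
        return x / (2.0 * p) + p / 2.0

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if obj(a) < obj(b):
            hi = b
        else:
            lo = a
    p = 0.5 * (lo + hi)
    return obj(p), p


if __name__ == "__main__":
    for x in (0.25, 2.0, 9.0):
        val, p = sqrt_via_variational(x)
        print(f"x={x}: min value {val:.12f}, sqrt(x) {math.sqrt(x):.12f}, argmin {p:.6f}")
```

Setting the derivative $-x/(2p^2) + 1/2$ to zero gives $p^* = \sqrt{x}$ directly; the numerical check only confirms that no sign or factor of 2 was lost.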


3) Linearize f: Since f is continuous and convex, we
can express it in terms of its convex conjugate $f^*(u) =
\sup_x u^Tx - f(x)$. In particular, applying [18, Thm. 12.2],
we have $f(x_0 + \sigma w) = \sup_u u^Tx_0 + \sigma u^Tw - f^*(u)$. The
supremum here is achieved at $u^* \in \partial f(x_0 + \sigma w)$ [18,
Thm. 23.5]. Also, from [20, Prop. 4.2.3], $\bigcup_{\|w\|\le K} \partial f(x_0 +
\sigma w)$ is bounded. Thus, the set of maximizers $u^*$ is bounded
and, for some $0 < M := M(K) < \infty$, $\tilde{w}$ is given as the
solution to

$$\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\ \|u\|\le M}}\ \min_{w,p,t}\ \frac{p+t}{2} + \frac{1}{2p}\|q - w\|^2 + \lambda u^Tw - \frac{1}{2p}\,\frac{t}{t + \beta^2 p\|g\|^2}\,\|\beta p h - (q - w)\|^2 + \frac{\lambda}{\sigma}F(u), \tag{21}$$

where we have flipped the order of min-max for w and u,
and have denoted

$$F(u) := u^Tx_0 - f^*(u) - f(x_0).$$
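For the $\ell_1$-norm used in the figures, F can be written explicitly: the conjugate of $\|\cdot\|_1$ is the indicator of the unit $\ell_\infty$ ball, so $F(u) = u^Tx_0 - \|x_0\|_1$ on that ball and $-\infty$ outside it. By Fenchel's inequality, F is never positive (a property recorded later as (27)), with equality exactly when u is a subgradient of $\|\cdot\|_1$ at $x_0$. The small standard-library sketch below illustrates this; the helper name and the 2-sparse example vector are hypothetical choices.

```python
import random


def l1_F(u, x0):
    """F(u) = u^T x0 - f*(u) - f(x0) for f = ||.||_1.

    The conjugate of the l1-norm is the indicator of the unit l_inf ball,
    so f*(u) = 0 if max_i |u_i| <= 1 and +inf otherwise.
    """
    if max(abs(ui) for ui in u) > 1.0:
        return float("-inf")  # f*(u) = +inf
    return sum(ui * xi for ui, xi in zip(u, x0)) - sum(abs(xi) for xi in x0)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.0] * 20
    x0[3], x0[11] = 1.5, -2.0  # a 2-sparse example
    worst = max(l1_F([rng.uniform(-1.0, 1.0) for _ in x0], x0) for _ in range(1000))
    print("largest F(u) over random feasible u:", worst)  # never positive
    # a subgradient of ||.||_1 at x0: signs on the support, anything in [-1, 1] off it
    u_sub = [1.0 if xi > 0 else -1.0 if xi < 0 else 0.3 for xi in x0]
    print("F at a subgradient:", l1_F(u_sub, x0))  # equality case of Fenchel's inequality
```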


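4) Redefine variables: It will be convenient for the calculations
that follow to redefine the variables β, t and λ as follows:

$$\beta := \beta p, \quad t := tp \quad \text{and} \quad \lambda := \lambda p.$$

It can be checked that with these changes the optimization
remains convex. Moreover, every term of the objective in (21)
other than p/2 now carries the common factor 1/(2p):

$$\frac{p}{2} + \frac{1}{2p}\Big(t + \|q - w\|^2 + 2\lambda u^Tw - \frac{t}{t + \beta^2\|g\|^2}\|\beta h - (q - w)\|^2 + \frac{2\lambda}{\sigma}F(u)\Big).$$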

5) Minimizing over the direction of w.' Evaluating the 
squares in (21) and after some algebra, it can be shown that 
the terms in which w appears are as follows: 


2(/32||g||2+t) 


(f — Au)^w, 


( 22 ) 


where 


f 


( Pt 
\ PP^P + t 


h + 


PV A 
PW + r) 


(23) 












which has entries i.i.d. Gaussians of zero mean and standard 
deviation 


CTf := crf(/3,f) 


/32||g||2 


(24) 


Fix the norm $\|w\|=\alpha$. Optimizing over the direction of $w$, the term $-(\mathbf f-\lambda u)^Tw$ in (22) becomes $-\alpha\|\mathbf f-\lambda u\|$.
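A numerical check of steps 4 and 5 may be reassuring. The sketch below assumes arbitrary test values for the variables before the change of variables, and that $h,q$ are independent standard Gaussian vectors. It verifies three things. First, the $w$-dependent part of (21) and the form (22) differ only by a $w$-independent constant after $\beta:=\beta p$, $t:=tp$, $\lambda:=\lambda p$. Second, the entries of $\mathbf f$ in (23) have standard deviation $\sigma_f$ from (24) for fixed $g$. Third, the direction step gives $-\alpha\|\mathbf f-\lambda u\|$. Names such as `w_terms_21` are hypothetical helpers for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 60                                   # illustrative sizes (assumption)
g = rng.standard_normal(m)
h, q, u = (rng.standard_normal(n) for _ in range(3))
G2 = g @ g                                      # ||g||^2

# Arbitrary test values of the variables *before* the change of variables of step 4.
beta_o, t_o, lam_o, p = 0.3, 0.8, 0.5, 1.7
# Step 4: beta := beta*p, t := t*p, lambda := lambda*p.
beta, t, lam = beta_o * p, t_o * p, lam_o * p

def w_terms_21(w):
    """Terms of (21) that depend on w, written in the original variables."""
    r = beta_o * p * h - q + w
    return ((q - w) @ (q - w) / (2 * p) + lam_o * (u @ w)
            - t_o * (r @ r) / (2 * p * (t_o + beta_o**2 * p * G2)))

def f_vector(h_, q_):
    """The vector f of (23) for given h, q (g, beta, t held fixed); works row-wise on arrays."""
    return (beta * t * h_ + beta**2 * G2 * q_) / (beta**2 * G2 + t)

def w_terms_22(w):
    """(1/p) [beta^2||g||^2 / (2(beta^2||g||^2 + t)) ||w||^2 - (f - lam u)^T w], cf. (22)."""
    c = beta**2 * G2 / (2 * (beta**2 * G2 + t))
    return (c * (w @ w) - (f_vector(h, q) - lam * u) @ w) / p

# Standard deviation of the entries of f for fixed g, cf. (24).
sigma_f = beta * np.sqrt(t**2 + beta**2 * G2**2) / (beta**2 * G2 + t)
```

Because only the difference between two evaluations matters, any $w$-independent terms dropped in (22) do not affect the check.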

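6) Minimize over $p$: Overall, the min-max problem in (19) has been reduced to:

$$\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\\|u\|\le M}}\ \min_{\substack{\alpha\ge0,\ 0<p\le P\\0<t\le T}}\left\{\frac{1}{2p}\left(t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\alpha\|\mathbf f-\lambda u\|+\frac{2\lambda}{\sigma}F(u)\right)+\frac{p}{2}\right\}$$

$$=\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\\|u\|\le M}}\ \min_{\substack{\alpha\ge0\\0<t\le T}}\left(t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\alpha\|\mathbf f-\lambda u\|+\frac{2\lambda}{\sigma}F(u)\right)^{1/2}. \qquad (25)$$

To obtain the equality above, we have applied (20) to the minimization over $p$.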

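7) Redefine $\lambda$: It is convenient to redefine $\lambda$ as $\lambda:=\lambda/\sigma_f$. Let $\tilde{\mathbf f}$ denote a standard i.i.d. Gaussian vector such that $\mathbf f=\sigma_f\tilde{\mathbf f}$. With these, we can express $w$ as the solution to:

$$\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\\|u\|\le M}}\ \min_{\substack{\alpha\ge0\\0<t\le T}}\left(t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\alpha\sigma_f\|\tilde{\mathbf f}-\lambda u\|+\frac{2\sigma_f\lambda}{\sigma}F(u)\right). \qquad (26)$$

Note that (26) is essentially the square of (25). Let us denote the optimal cost of (26) above as $\phi(\sigma):=\phi(\sigma;g,h,q,\tilde{\mathbf f})$.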


D. The Auxiliary Optimization in the limit σ → 0

[7, Thm. II.2] relates the norm of the optimal $w$ in the original problem to the norm of the optimal $w$ in the (AO), under appropriate assumptions. Also, recall that we wish to characterize $\lim_{\sigma\to0}\|w\|$. Thus, in view of (26), we wish to analyze the problem

$$\phi_0:=\phi_0(g,h,q,\tilde{\mathbf f}):=\lim_{\sigma\to0}\phi(\sigma;g,h,q,\tilde{\mathbf f}).$$

In (26), from Fenchel's inequality:

$$F(u)=u^Tx_0-f^*(u)-f(x_0)\le0. \qquad (27)$$

With this observation, we prove in the next lemma that $\phi(\sigma;g,h,q,\tilde{\mathbf f})$ is non-decreasing in $\sigma$.
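To make (27) concrete, take $f=\|\cdot\|_1$ purely as an illustrative example (the analysis itself is for general convex $f$). Then $f^*$ is the indicator of the unit $\ell_\infty$ ball, so $F(u)=u^Tx_0-\|x_0\|_1\le0$ by Hölder's inequality, with equality exactly when $u\in\partial\|x_0\|_1$. The sketch below, with hypothetical helper names, checks both facts on a random sparse $x_0$.

```python
import numpy as np

def F_l1(u, x0):
    """F(u) = u^T x0 - f*(u) - f(x0) for the example choice f = l1 norm.

    f* is the indicator of the unit l_inf ball (0 inside, +inf outside),
    so F(u) = u^T x0 - ||x0||_1 inside the ball and -inf outside it.
    """
    if np.max(np.abs(u)) > 1.0:
        return -np.inf
    return float(u @ x0 - np.abs(x0).sum())

def in_subdiff_l1(u, x0, tol=1e-12):
    """u in the l1 subdifferential at x0: u_i = sign(x0_i) on the support, |u_i| <= 1 off it."""
    S = x0 != 0
    on = np.all(np.abs(u[S] - np.sign(x0[S])) <= tol)
    off = np.all(np.abs(u[~S]) <= 1.0 + tol)
    return bool(on and off)

rng = np.random.default_rng(2)
n, k = 50, 5                       # illustrative dimension and sparsity (assumption)
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)
```

In the proof of the next lemma, the only $\sigma$-dependent term in (26) is $2\sigma_f\lambda F(u)/\sigma$, which is nonpositive because of exactly this inequality.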

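Lemma 1.1: Fix $g,h,q,\tilde{\mathbf f}$ and consider $\phi(\cdot;g,h,q,\tilde{\mathbf f}):(0,\infty)\to\mathbb R$ as defined in (26). Then $\phi(\sigma;g,h,q,\tilde{\mathbf f})$ is non-decreasing in $\sigma$.

Proof: Denote by $\mathcal L(\sigma,\alpha,t,\beta,u,\lambda)$ the objective function in (26) and consider $0<\sigma_1<\sigma_2<\infty$. Let $(\alpha^{(2)},t^{(2)},\beta^{(2)},u^{(2)},\lambda^{(2)})$ be an optimal solution to the min-max problem in (26) for $\sigma_2$. Then, let $(\beta^{(1)},u^{(1)},\lambda^{(1)})=\arg\max_{\beta,u,\lambda}\mathcal L(\sigma_1,\alpha^{(2)},t^{(2)},\beta,u,\lambda)$. Clearly,

$$\phi(\sigma_1)\le\mathcal L(\sigma_1,\alpha^{(2)},t^{(2)},\beta^{(1)},u^{(1)},\lambda^{(1)}).$$

Using $1/\sigma_1>1/\sigma_2$ and (27) (the only $\sigma$-dependent term of $\mathcal L$ is $2\sigma_f\lambda F(u)/\sigma\le0$),

$$\mathcal L(\sigma_1,\alpha^{(2)},t^{(2)},\beta^{(1)},u^{(1)},\lambda^{(1)})\le\mathcal L(\sigma_2,\alpha^{(2)},t^{(2)},\beta^{(1)},u^{(1)},\lambda^{(1)}).$$

But,

$$\mathcal L(\sigma_2,\alpha^{(2)},t^{(2)},\beta^{(1)},u^{(1)},\lambda^{(1)})\le\phi(\sigma_2).$$

Combine the above chain of inequalities to conclude. ■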
In particular, when viewed as a function of $\kappa:=1/\sigma$, $\phi(\cdot;g,h,q,\tilde{\mathbf f})$ is non-increasing. Thus,

$$\phi_0=\lim_{\sigma\to0}\phi(\sigma)=\lim_{\kappa\to\infty}\phi(\kappa)=\inf_{\kappa>0}\phi(\kappa). \qquad (28)$$


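Next, we argue that we can flip the order of min-max. The objective function in (26) is continuous, convex in $\kappa$, and concave in $\lambda,\beta,u$. The constraint set on $\lambda$ appears to be unbounded, but it can be checked from (26) that the optimal value is in fact bounded. With this and (28), we get

$$\phi_0=\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\\|u\|\le M}}\ \inf_{\kappa>0}\ \min_{\substack{\alpha\ge0\\0<t\le T}}\left(t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\alpha\sigma_f\|\tilde{\mathbf f}-\lambda u\|+2\kappa\sigma_f\lambda F(u)\right).$$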

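Recall (27) and the fact that equality is achieved iff $u\in\partial f(x_0)$ (e.g., [18, Thm. 23.5]). If $F(u)<0$ and $\lambda>0$, the infimum over $\kappa$ drives the objective to $-\infty$, so the maximization selects $u$ with $F(u)=0$. Then, $\phi_0$ is given by

$$\max_{\substack{\lambda\ge0,\ 0\le\beta\le B\\u\in\partial f(x_0)}}\ \min_{\substack{\alpha\ge0\\0<t\le T}}\left(t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\sigma_f\alpha\|\tilde{\mathbf f}-\lambda u\|\right),$$

where we have assumed $\infty>M>\max_{s\in\partial f(x_0)}\|s\|$. We can now optimize over $\lambda u$ (after appropriately flipping the order of min-max): $\min_{\lambda\ge0,\,u\in\partial f(x_0)}\|\tilde{\mathbf f}-\lambda u\|=\mathrm{dist}(\tilde{\mathbf f},\mathrm{cone}(\partial f(x_0)))$. Thus, we conclude with the (AO) for $\sigma\to0$ taking the form: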


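$$\phi_0(g,h,q,\tilde{\mathbf f})=\min_{0\le\alpha\le K}\ \ell(\alpha;g,h,q,\tilde{\mathbf f}),$$

$$\ell(\alpha;g,h,q,\tilde{\mathbf f}):=\min_{0<t\le T}\ \max_{0\le\beta\le B}\left\{t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}+\frac{\beta^2\|g\|^2\,\alpha^2}{\beta^2\|g\|^2+t}-2\sigma_f\alpha d_h\right\}, \qquad (29)$$

where we have denoted $d_h:=\mathrm{dist}(\tilde{\mathbf f},\mathrm{cone}(\partial f(x_0)))$.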
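The quantity $d_h$ can be computed explicitly once $f$ is fixed. As an illustration only, assume $f=\|\cdot\|_1$ and $x_0\ne0$ with support $S$, $|S|=k$. For $\lambda\ge0$, $\mathrm{dist}^2(g,\lambda\partial\|x_0\|_1)=\sum_{i\in S}(g_i-\lambda\,\mathrm{sign}(x_{0,i}))^2+\sum_{i\notin S}(|g_i|-\lambda)_+^2$, which is convex in $\lambda$, so a bisection on its derivative gives $d_h^2$. For fixed $\tau$, $k(1+\tau^2)+2(n-k)[(1+\tau^2)Q(\tau)-\tau\varphi(\tau)]$ is exactly $\mathbb E\,\mathrm{dist}^2(g,\tau\partial\|x_0\|_1)$, so its infimum over $\tau$ upper bounds $\mathbb E\,d_h^2$. Here $Q$ is the standard Gaussian tail and $\varphi$ the density. The sizes, seed and helper names below are assumptions.

```python
import math
import numpy as np

def dist2_to_cone_l1(G, x0, iters=100):
    """Squared distance of each row of G to cone(subdiff ||.||_1 at x0), assuming x0 != 0.

    Half the derivative in lam of dist^2(g, lam * subdiff) is
    k*lam - sum_S s_i g_i - sum_{S^c} (|g_i| - lam)_+, which is nondecreasing, so we bisect.
    """
    S = x0 != 0
    s, k = np.sign(x0[S]), int(S.sum())
    GS, Gc = G[:, S], np.abs(G[:, ~S])
    corr = GS @ s                                          # sum_S s_i g_i, per row
    lo = np.zeros(G.shape[0])
    hi = np.maximum(Gc.max(axis=1, initial=0.0), np.abs(corr) / k) + 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        half_deriv = k * mid - corr - np.clip(Gc - mid[:, None], 0.0, None).sum(axis=1)
        up = half_deriv > 0                                # optimum lies to the left of mid
        hi = np.where(up, mid, hi)
        lo = np.where(up, lo, mid)
    lam = (lo + hi) / 2
    return (((GS - lam[:, None] * s) ** 2).sum(axis=1)
            + (np.clip(Gc - lam[:, None], 0.0, None) ** 2).sum(axis=1))

def l1_expected_dist2_bound(n, k, num=5001):
    """inf over tau in [0, 5] of k(1+tau^2) + 2(n-k)[(1+tau^2)Q(tau) - tau*phi(tau)]."""
    best = math.inf
    for tau in np.linspace(0.0, 5.0, num):
        Q = 0.5 * math.erfc(tau / math.sqrt(2.0))
        phi = math.exp(-tau**2 / 2) / math.sqrt(2 * math.pi)
        best = min(best, k * (1 + tau**2) + 2 * (n - k) * ((1 + tau**2) * Q - tau * phi))
    return best

rng = np.random.default_rng(4)
n, k = 400, 20
x0 = np.zeros(n)
x0[:k] = rng.choice([-1.0, 1.0], size=k)                  # equal magnitudes, random signs
d2 = dist2_to_cone_l1(rng.standard_normal((1000, n)), x0)  # rows play the role of f~
print(d2.mean(), l1_expected_dist2_bound(n, k))
```

The Monte Carlo mean of $d_h^2$ sits slightly below the bound. This is consistent with the concentration $d_h^2\approx\omega^2_{f,x_0}$ used in the probabilistic analysis.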

It is now easy to optimize (29) over $\alpha$. We summarize the result in the following lemma.

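Lemma 1.2: In (29), fix $g,h,q$ and let $\hat w:=\hat w(g,h,q)$ be optimal. Denote

$$\tilde{\mathbf f}:=\tilde{\mathbf f}(\beta,t):=\frac{t\,h+\beta\|g\|^2\,q}{\sqrt{t^2+\beta^2\|g\|^4}}$$

and $v(\beta,t):=\mathrm{dist}(\tilde{\mathbf f},\mathrm{cone}(\partial f(x_0)))$. Then,

$$\|\hat w\|=\frac{\sqrt{\hat t^2+\hat\beta^2\|g\|^4}}{\hat\beta\|g\|^2}\,v(\hat\beta,\hat t), \qquad (30)$$

where $\hat\beta,\hat t$ are optimal solutions to the following optimization:

$$\max_{B\ge\beta\ge0}\ \min_{T\ge t\ge0}\ t+\|q\|^2-\frac{t\,\|\beta h-q\|^2}{\beta^2\|g\|^2+t}-\frac{(t^2+\beta^2\|g\|^4)\,v^2(\beta,t)}{\|g\|^2(\beta^2\|g\|^2+t)}. \qquad (31)$$

Note that $\mathbf f=\sigma_f\tilde{\mathbf f}$, where $\tilde{\mathbf f}$ is standard i.i.d. Gaussian and

$$\sigma_f:=\sigma_f(\beta,t):=\frac{\beta\sqrt{t^2+\beta^2\|g\|^4}}{\beta^2\|g\|^2+t}.$$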
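The $\alpha$-step behind Lemma 1.2 is a one-dimensional quadratic. With $c=\beta^2\|g\|^2/(\beta^2\|g\|^2+t)$, the $\alpha$-dependent part of (29) is $c\alpha^2-2\sigma_f v\alpha$. It is minimized at $\hat\alpha=\sigma_f v/c$ with value $-\sigma_f^2v^2/c$, which simplify to the factor in (30) and the last term of (31). The sketch below checks this for arbitrary test values of $\beta,t,\|g\|^2,v$ (assumptions, not values from the paper). It treats $\sigma_f$ and $v$ as fixed while optimizing over $\alpha$.

```python
import numpy as np

def alpha_step(beta, t, G2, v):
    """Minimize c*alpha^2 - 2*sigma_f*v*alpha over alpha >= 0, with G2 = ||g||^2,
    c = beta^2 G2 / (beta^2 G2 + t) and sigma_f = beta sqrt(t^2 + beta^2 G2^2) / (beta^2 G2 + t).
    Returns the minimizer alpha_hat and the minimum value."""
    c = beta**2 * G2 / (beta**2 * G2 + t)
    sigma_f = beta * np.sqrt(t**2 + beta**2 * G2**2) / (beta**2 * G2 + t)
    alpha_hat = sigma_f * v / c
    return alpha_hat, -((sigma_f * v) ** 2) / c

beta, t, G2, v = 0.8, 3.0, 45.0, 6.0   # arbitrary test values
print(alpha_step(beta, t, G2, v))
```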


E. Probabilistic Analysis

Lemma 1.2 derives an expression for $\|\hat w\|$ for fixed $g,h,q$. Here, we evaluate the limiting behavior of this expression. Recall that $g,h,q$ are all i.i.d. standard Gaussian vectors and assume the linear large-system limit regime as in the statement of Theorem 2.1. We use the following notation: let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables and $\{c_n\}$ a deterministic sequence; then $X_n\xrightarrow{P}c_n$ iff for all $\epsilon>0$, the event $|X_n-c_n|<\epsilon c_n$ occurs w.p. 1 in the limit $n\to\infty$. For the purpose of this section, convergence is to be understood in the aforementioned meaning.

From standard concentration results on Gaussian r.v.s: $\|g\|^2\xrightarrow{P}m$, $\|h\|^2\xrightarrow{P}n$, $\|q\|^2\xrightarrow{P}n$, $\|\beta h-q\|^2\xrightarrow{P}(\beta^2+1)n$, and $v^2(\beta,t)\xrightarrow{P}\omega^2_{f,x_0}$. For the last relation, we have used the property of the Gaussian width as in [2, Prop. 10.1]. Hence, for any fixed $\beta,t$, the objective function in (31) converges to

$$d(\beta,t)=t+\frac{n\beta^2(m-t)}{\beta^2m+t}-\frac{\omega^2_{f,x_0}\,(t^2+\beta^2m^2)}{m(\beta^2m+t)}. \qquad (32)$$

It can be checked that the objective function in (31) is convex in $t$ and concave in $\beta$. Also, the constraint sets are compact. Thus, it follows from [21, Cor. II.1] ("point-wise convergence in probability of concave functions implies uniform convergence in compact spaces") that the convergence in (32) is uniform over $\beta$ and $t$. As will be shown next, provided that the constants determining the constraint sets are large enough, there exist unique $\beta_*$ and $t_*$ that are optimal in (32). Hence, as in [22, Thm. 2.7], the optimal solutions of (31) indeed converge to the deterministic solutions of (32), which we calculate below. The constant bounds $B,T$ on the variables $\beta,t$ will be specified later. Denote by $\beta_*,t_*$ optimal solutions in

$$\max_{0\le\beta\le B}\ \min_{0\le t\le T}\ d(\beta,t).$$

Let us write $\omega:=\omega_{f,x_0}$. We differentiate the objective with respect to both $\beta$ and $t$ to find:

$$\frac{\partial d(\beta_*,t_*)}{\partial t}=1-\frac{nm\beta_*^2(\beta_*^2+1)}{(\beta_*^2m+t_*)^2}-\frac{\omega^2\,(t_*^2+2m\beta_*^2t_*-m^2\beta_*^2)}{m(\beta_*^2m+t_*)^2}, \qquad (33a)$$

$$\frac{\partial d(\beta_*,t_*)}{\partial\beta}=\frac{2\beta_*t_*(m-t_*)(n-\omega^2)}{(\beta_*^2m+t_*)^2}. \qquad (33b)$$

Setting them to zero, from (33b) we have $\beta_*=0$, $t_*=0$ or $t_*=m$. We consider each case separately. Assume $\beta_*=0$; then $t_*=\arg\min_t d(0,t)=\arg\min_t t(1-\omega^2/m)=0$ and $d(\beta_*,t_*)=0$. Next, suppose $t_*=m$. Substituting this in (33a), we find

$$(n-m)\beta_*^4+\big((n-m)-(m-\omega^2)\big)\beta_*^2-(m-\omega^2)=0.$$

Solving this yields $\beta_*^2=\frac{m-\omega^2}{n-m}>0$. Choose $B,T$ such that $\beta_*,t_*$ are feasible. From convexity, first-order optimality conditions are sufficient. What is left is to substitute these limit values $\beta_*$ and $t_*$ in (30) in Lemma 1.2, to conclude with

$$\|\hat w\|^2\xrightarrow{P}\frac{\omega^2(n-\omega^2)}{m-\omega^2}.$$
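The deterministic part of this analysis is easy to check numerically. The sketch below encodes the limit objective (32) and its partial derivatives (33a) and (33b). It checks the stationary point $t_*=m$, $\beta_*^2=(m-\omega^2)/(n-m)$ and the saddle-point property on grids, then evaluates the limit of $\|\hat w\|^2$ obtained from (30) with $\|g\|^2\to m$ and $v^2\to\omega^2$. The values $m=50$, $n=100$, $\omega^2=20$ are arbitrary test values satisfying $\omega^2<m<n$. For them, the closed form gives $\omega^2(n-\omega^2)/(m-\omega^2)=1600/30\approx53.3$ and the optimal cost is $m-\omega^2=30$.

```python
import numpy as np

def d(beta, t, m, n, om2):
    """Limit objective (32), with om2 = omega_{f,x0}^2; t may be a numpy array."""
    den = beta**2 * m + t
    return t + n * beta**2 * (m - t) / den - om2 * (t**2 + beta**2 * m**2) / (m * den)

def dd_dt(beta, t, m, n, om2):
    """Partial derivative (33a) of d with respect to t."""
    den = beta**2 * m + t
    return (1 - n * m * beta**2 * (beta**2 + 1) / den**2
            - om2 * (t**2 + 2 * m * beta**2 * t - m**2 * beta**2) / (m * den**2))

def dd_dbeta(beta, t, m, n, om2):
    """Partial derivative (33b) of d with respect to beta."""
    return 2 * beta * t * (m - t) * (n - om2) / (beta**2 * m + t) ** 2

def stationary_point(m, n, om2):
    """t* = m and beta*^2 = (m - om2)/(n - m), assuming om2 < m < n."""
    return np.sqrt((m - om2) / (n - m)), float(m)

def nse_limit(m, n, om2):
    """||w||^2 from (30) at (beta*, t*), using ||g||^2 -> m and v^2 -> om2."""
    beta, t = stationary_point(m, n, om2)
    return (t**2 + beta**2 * m**2) * om2 / (beta**2 * m**2)

m, n, om2 = 50.0, 100.0, 20.0
b_star, t_star = stationary_point(m, n, om2)
print(b_star**2, d(b_star, t_star, m, n, om2), nse_limit(m, n, om2))
```

At $t=m$ the objective equals $m-\omega^2$ for every $\beta$, and for $\beta=\beta_*$ the point $t=m$ is the minimizer over $t$. This is why $(\beta_*,t_*)$ is a saddle point.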

F. Proof Outline of Theorem 2.2

In the next few lines we outline only the main checkpoints involved in the proof of Theorem 2.2. The analysis follows along the same lines as in Sections B, C and E for the proof of Theorem 2.1. In fact, things here are less involved, since we are only interested in lower bounding the optimal cost of a min-max problem, and not in its optimal solutions. Hence, a single application of the GMT suffices, and the framework of [5], [7] is not required. A detailed proof will be included in the future extended version of the paper.

Denote $C:=\mathcal T_f(x_0)\cap\mathcal S^{n-1}$. We write the mCSV of $A$ as

$$\sigma_{\min}(A;\mathcal T_f(x_0))=\min_{x\in C}\ \max_{\|y\|\le1}\ y^TAx. \qquad (34)$$

We prove a high-probability lower bound on the optimal cost of this min-max optimization. We do so by applying Gordon's GMT, just as is done in the Gaussian case. But first, we need to bring (34) into a format where the GMT is applicable. After replicating the ideas of Section B and applying the GMT, it can be shown that it suffices to lower bound the optimal cost of the following (AO) problem instead:

$$\min_{x\in C,\ \xi}\ \max_{B\ge\beta\ge0}\ \|x-\xi\|+\beta\big(\|\xi\|\|g\|-h^T\xi\big). \qquad (35)$$

Next, as in Section C, we perform a deterministic (fixed $g,h$) analysis of (35) to simplify it as much as possible into a scalar optimization problem. The only caution here is that the constraint set $C$ on $x$ is non-convex; thus, we are not allowed to flip min-max operations "carelessly". It can be shown that (35) has optimal cost $\sqrt{F}$, where $F:=F(g,h)$ is the optimal cost of the following optimization:

$$\min_{x\in C,\ T\ge t\ge0}\ \max_{B\ge\beta\ge0}\ \frac{\beta^2\|g\|^2-t\beta^2\|h\|^2-2t\beta h^Tx+t\beta^2\|g\|^2+t^2}{\beta^2\|g\|^2+t}. \qquad (36)$$

After applying the min-max inequality (e.g., [18, Lemma 36.1]), it is easy to optimize over $x$ by choosing it to maximize $h^Tx$ over $C$; then $F$ is the optimal cost of a scalar optimization problem involving only the r.v.s $\|g\|$, $\|h\|$ and $\max_{x\in C}h^Tx$. All three are 1-Lipschitz functions of Gaussian vectors; thus, they concentrate (and hence converge in the proportional regime) around their mean values, approximately $\sqrt{m}$, $\sqrt{n}$ and $\omega_{f,x_0}$, respectively.
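The concentration claim can be illustrated with the simplest of the three quantities, the norm of a standard Gaussian vector. It is 1-Lipschitz, so its variance is at most 1 in every dimension (Gaussian Poincaré inequality), while its mean grows like the square root of the dimension. The sketch below uses an arbitrary number of trials and seed (assumptions). Its output shows the ratio of the mean to the square root of the dimension approaching 1 while the spread stays bounded.

```python
import numpy as np

def norm_stats(dim, trials=400, seed=0):
    """Return mean(||g||)/sqrt(dim) and std(||g||) for g ~ N(0, I_dim)."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(rng.standard_normal((trials, dim)), axis=1)
    return norms.mean() / np.sqrt(dim), norms.std()

for dim in (100, 1_000, 10_000):
    print(dim, *norm_stats(dim))
```

The same bounded-fluctuation behavior applies to $\|h\|$ and to $\max_{x\in C}h^Tx$, since both are also 1-Lipschitz in the underlying Gaussian vector.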

Also, the problem is convex in $\beta,t$; thus, we can obtain the expression of Theorem 2.2 (with the correspondence $\beta\leftrightarrow\chi$, $t\leftrightarrow\rho$) from first-order optimality conditions.














